In [1]:
%matplotlib inline
import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)
import matplotlib.pyplot as plt
Alright, in this section we're going to continue with the running data set, but we're going to dive a bit deeper into ways of analyzing the data, including filtering, dropping rows, doing some groupings, and that sort of thing.
So what we'll do is read in our csv file.
In [7]:
pd.read_csv?
In [8]:
list(range(1,7))
Out[8]:
In [13]:
df = pd.read_csv('../data/date_fixed_running_data_with_time.csv', parse_dates=['Date'], usecols=list(range(0,6)))
In [14]:
df.dtypes
Out[14]:
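The read_csv call above can be sketched on a tiny in-memory CSV. This is a minimal illustration, the column names here are made-up stand-ins, not the real file's:

```python
import io
import pandas as pd

# A small in-memory CSV standing in for the running-data file
# (column names are illustrative, not the actual file's).
csv_text = """Date,Miles
2014-01-01,2.0
2014-01-02,0.0
"""

# parse_dates converts the named columns to datetime64;
# usecols limits which columns get read at all.
small = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], usecols=[0, 1])
print(small.dtypes)
```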
In [15]:
df.sort_values('Date', inplace=True)
df.head()
Out[15]:
Now let's talk about getting some summary statistics. I would encourage you to try these on your own. We've learned pretty much everything we need in order to do these without guidance, and as always, if you need clarification just ask on the side.
What was the longest run in miles and minutes that I ran?
In [16]:
df.Minutes.max()
Out[16]:
In [17]:
df.Miles.max()
Out[17]:
What about the shortest in miles and minutes that I ran?
In [18]:
df.Minutes.min()
Out[18]:
In [19]:
df.Miles.min()
Out[19]:
We forgot to exclude the zero values from the days I didn't run, so how would we do it while ignoring those?
In [20]:
df.Miles[df.Miles > 0].min()
Out[20]:
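The boolean-mask trick above can be shown on a toy Miles column (the values here are made up for illustration):

```python
import pandas as pd

# A toy Miles column with zero-mile rest days mixed in.
miles = pd.Series([0.0, 3.1, 0.0, 1.5, 6.0])

# Boolean indexing keeps only the positive entries,
# so min() skips the rest days entirely.
shortest_real_run = miles[miles > 0].min()
print(shortest_real_run)  # 1.5
```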
What was the most common running distance I did, excluding times when I didn't run at all?
In [21]:
df.Miles[df.Miles > 0].value_counts().index[0]
Out[21]:
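The value_counts approach works because the result is sorted by frequency. A minimal sketch with made-up values:

```python
import pandas as pd

# Toy data: 3.0 miles appears most often among actual runs.
miles = pd.Series([0.0, 3.0, 3.0, 5.0, 0.0, 3.0])

# value_counts sorts by frequency (descending),
# so index[0] is the most common value.
most_common = miles[miles > 0].value_counts().index[0]
print(most_common)  # 3.0
```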
Plot a graph of the cumulative running distance in this dataset.
In [22]:
df.Miles.cumsum().plot()
plt.xlabel("Day Number")
plt.ylabel("Distance")
Out[22]:
Plot a graph of the cumulative running hours in this data set.
In [23]:
(df.Minutes.fillna(0).cumsum() / 60).plot()
Out[23]:
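The fillna-then-cumsum pattern is worth seeing in isolation, since a missing value would otherwise propagate NaN through the cumulative sum. A sketch with made-up minutes:

```python
import pandas as pd

# Toy minutes column with a missing day.
minutes = pd.Series([30.0, None, 45.0])

# fillna(0) treats missing days as zero minutes so cumsum stays defined,
# and dividing by 60 converts cumulative minutes to hours.
hours = minutes.fillna(0).cumsum() / 60
print(hours.tolist())  # [0.5, 0.5, 1.25]
```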
Another interesting question we could ask is: what days of the week do I commonly go for runs? Am I faster on certain days, or does my speed improve over time relative to the distance that I'm running?
So let's get our days of the week.
In [24]:
df.Date[0].strftime("%A")
Out[24]:
We will do that by mapping our Date column to the day-name format we need.
In [25]:
df.Date.map(lambda x: x.strftime("%A")).head()
Out[25]:
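The element-wise map can be sketched on a couple of known dates; note that the vectorized `.dt` accessor gives the same result:

```python
import pandas as pd

# Two dates whose weekdays we know: 2014-01-06 was a Monday.
dates = pd.Series(pd.to_datetime(['2014-01-06', '2014-01-07']))

# Element-wise map, as in the notebook...
day_names_map = dates.map(lambda x: x.strftime("%A"))

# ...or the vectorized .dt accessor, which does the same thing.
day_names_dt = dates.dt.strftime("%A")
print(day_names_map.tolist())  # ['Monday', 'Tuesday']
```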
Then we just set that to a new column.
In [26]:
df['Day_of_week'] = df.Date.map(lambda x: x.strftime("%A"))
In [27]:
df.head(10)
Out[27]:
We can make a bar plot of it and see if we can distinguish anything unique about certain days of the week.
In [28]:
df[df.Miles > 0].Day_of_week.value_counts().plot(kind='bar')
Out[28]:
We can see that in this sample I run a lot more on Friday, Saturday, and Monday. Some interesting patterns. Why don't we try looking at the means and that sort of thing?
But before we get there: at this point our data frame is getting pretty messy, and I think it's worth explaining how to remove columns and how to add and remove rows.
First let's remove the Time column, seeing as we already have Minutes and Seconds.
In [29]:
del df['Time']
del will delete it in place
In [30]:
df.head()
Out[30]:
Finally, we can use drop to drop a column. Note that we have to specify the axis (we can also use this to drop rows), and that drop does not happen in place.
In [31]:
df.drop('Seconds',axis=1)
Out[31]:
We can also use drop to remove a specific row by specifying axis 0.
In [32]:
tempdf = pd.DataFrame(np.arange(4).reshape(2,2))
tempdf
Out[32]:
In [33]:
tempdf.drop(1,axis=0)
Out[33]:
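Since drop returns a new object rather than modifying in place, the original frame survives both calls. A minimal sketch (with made-up column labels):

```python
import numpy as np
import pandas as pd

tmp = pd.DataFrame(np.arange(4).reshape(2, 2), columns=['a', 'b'])

# drop returns a new DataFrame; tmp itself is unchanged.
no_row = tmp.drop(1, axis=0)    # drop the row with index label 1
no_col = tmp.drop('b', axis=1)  # drop column 'b'
print(no_row.shape, no_col.shape)  # (1, 2) (2, 1)
```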
We already saw how to create a new column; we can also create a new row using the append method. This takes a data frame or Series and appends it to the end of the data frame.
In [34]:
tempdf.append(pd.Series([4,5]), ignore_index=True)
Out[34]:
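One caveat: append was removed in pandas 2.0, so on newer versions the equivalent is pd.concat. A sketch of the same row-append with concat (toy values):

```python
import pandas as pd

tmp = pd.DataFrame({'a': [0, 2], 'b': [1, 3]})
new_row = pd.DataFrame({'a': [4], 'b': [5]})

# pd.concat is the modern replacement for append;
# ignore_index renumbers the resulting index 0..n-1.
combined = pd.concat([tmp, new_row], ignore_index=True)
print(combined)
```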
In [35]:
df.head()
Out[35]:
We can also pop out a column, which removes it from the data frame and returns the Series. You'll see that it happens in place.
In [36]:
df.pop('Seconds')
Out[36]:
In [37]:
df.head()
Out[37]:
Now we've made our dataset a bit more manageable, and we've got the basics of what we need to perform some groupwise analysis.
Now at this point we're going to do some groupings. This is an extremely powerful part of pandas and one that you'll use all the time.
pandas follows the Split-Apply-Combine style of data analysis.
Many data analysis problems involve the application of a split-apply-combine strategy, where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together.
Hadley Wickham from Rice University: http://www.jstatsoft.org/v40/i01/paper
Since we're going to want to check things in groups, what I'm going to do is analyze each day of the week to see if there are any differences in the types of running that I do on those days.
We'll start by grouping our data set on those weekdays, basically creating a dictionary-like mapping where the key is the weekday and the value is the data frame of the rows for that day.
First let's do this the hard way....
In [39]:
for dow in df.Day_of_week.unique():
    print(dow)
    print(df[df.Day_of_week == dow])
    break
This is clearly an ugly way to do it, and pandas provides a much simpler approach: creating a groupby object.
But first I'm going to filter out our zero values because they'll throw off our analysis.
In [40]:
df['Miles'] = df.Miles[df.Miles > 0]
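That assignment is subtler than it looks: the filtered Series is aligned back on the index, so the zero-mile rows become NaN rather than being dropped. A sketch with made-up values:

```python
import pandas as pd

toy = pd.DataFrame({'Miles': [0.0, 3.0, 0.0, 5.0]})

# Assigning a filtered Series back aligns on the index, so rows that
# fail the condition become NaN instead of disappearing.
toy['Miles'] = toy.Miles[toy.Miles > 0]
print(toy.Miles.tolist())  # [nan, 3.0, nan, 5.0]
```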
In [42]:
dows = df.groupby('Day_of_week')
print(dows)
We can get the size of each group by using the size method. This tells us how many items are in each category.
In [43]:
dows.size()
Out[43]:
In [44]:
dows.count()
Out[44]:
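The difference between size and count is easy to miss: size counts rows per group, while count counts non-null values per column. A sketch with toy data containing one missing value:

```python
import pandas as pd

df_toy = pd.DataFrame({
    'Day_of_week': ['Monday', 'Monday', 'Friday'],
    'Miles': [3.0, None, 5.0],
})
g = df_toy.groupby('Day_of_week')

# size() counts rows per group, including rows with NaN;
# count() counts only non-null values per column per group.
print(g.size())
print(g.count())
```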
Now we have our groups and we can start doing groupwise analysis. What does that mean?
It means we can start answering questions like: what is the average speed per weekday, or what is the total miles run per weekday?
In [45]:
dows.mean()
Out[45]:
In [46]:
dows.sum()
Out[46]:
It might be interesting to look at the totals to try to spot outliers, since Thursday, Friday, and Saturday are relatively close in distance but not so much in speed.
We also get access to a lot of summary statistics from here that we can get from the groups.
In [47]:
dows.describe()
Out[47]:
In [48]:
df.groupby('Day_of_week').mean()
Out[48]:
In [49]:
df.groupby('Day_of_week').std()
Out[49]:
Iterating through the groups is also very straightforward.
In [51]:
for name, group in dows:
    print(name)
    print(group)
You can get specific groups by using the get_group method.
In [52]:
dows.get_group('Friday')
Out[52]:
We can use the agg method to apply an operation to each group, for example getting the count for each group.
In [53]:
dows.agg(lambda x: len(x))['Miles']
Out[53]:
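The agg call above can be sketched on a toy frame; the lambda runs once per column per group, and len on a group's column gives its row count:

```python
import pandas as pd

df_toy = pd.DataFrame({
    'Day_of_week': ['Monday', 'Friday', 'Monday'],
    'Miles': [3.0, 5.0, 2.0],
})
g = df_toy.groupby('Day_of_week')

# agg applies the function to each group's column;
# len(x) gives the number of rows in that group.
counts = g.agg(lambda x: len(x))['Miles']
print(counts)
```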
Another way to do this would be to add a count column to our data frame and then sum each group.
In [54]:
df['Count'] = 1
df.head(10)
Out[54]:
In [55]:
df.groupby('Day_of_week').sum()
Out[55]:
That about covers everything I wanted to cover regarding pandas basics for grouping, filtering, and that sort of thing. This is really the foundation and likely a lot of what you'll be using on a regular basis. In the next section we will dive a bit deeper into more intensive analysis and more detailed function usage.
In [ ]: